Discrete cosine high-speed arithmetic unit and related arithmetic unit

ABSTRACT

An arithmetic unit for carrying out partial sum of products for transform operations such as discrete cosine transform is provided which includes a plurality of first units for calculating in parallel sums of and/or differences between a plurality of input variables or sums of and/or differences between a plurality of values obtained by multiplying said plurality of input variables by a constant. The arithmetic unit also includes a processing unit having a plurality of shift units for shifting outputs from said plurality of first units by respectively predetermined numbers of digit-shifts and a plurality of second units for calculating concurrently sums of outputs from said plurality of shift units. The arithmetic unit can be used, for example, as a high-speed discrete cosine unit, a high-speed Hartley transform unit or a high-speed Hough transform unit.

TECHNICAL FIELD

The present invention relates to an arithmetic unit of a computer system, and in particular, to a discrete cosine high-speed arithmetic unit suitable for achieving calculation of a sum of products using a plurality of constant function values and compressing and decompressing data at a high speed. Moreover, the present invention relates to a high-speed Hartley transform arithmetic unit suitable for calculating a sum of products using a plurality of constant function values and thereby executing the Hartley transform processing, which is related to a Fourier transform, at a high speed. Additionally, the present invention relates to image processing, and in particular, to a Hough transform circuit to achieve a Hough transform in which straight line components of an image are detected, the circuit being suitable for calculating a sum of products using a plurality of constant function values and executing the Hough transform processing at a high speed.

BACKGROUND ART

In voice and image processing, a discrete Fourier transform (DFT) and its variations such as a discrete cosine transform and a discrete Hartley transform have been widely employed. In these transform processes, a plurality of trigonometric functions are utilized to primarily calculate sums of products between the trigonometric functions and data items. In general, the calculation cost of multiplication is higher than that of addition and subtraction. Consequently, several high-speed calculation algorithms have been devised in which the number of multiplications is advantageously reduced using relationships between trigonometric functions, e.g., the double-angle formula and the half-angle formula. These algorithms have been briefly described in pages 115 to 142 of the "Nikkei Electronics" No. 511 published on Oct. 15, 1990. In practice, the trigonometric functions are stored as constants in a memory. Particularly, due to the relatively small number of figures of the values, there has also been adopted a method in which the results of products between data items and trigonometric functions are stored in a memory. In addition, it is possible to utilize a known method in which each trigonometric function value is calculated in a CORDIC method using the principle of rotation of coordinates and/or a formula of approximate expression of function.

In image processing, the Hough transform is often employed to detect straight lines in an image because the transform is advantageously applicable even when the data contains noise. When the coordinates of an arbitrary pixel are expressed as (x,y), the Hough transform is defined as

    R = x cos θ + y sin θ = x cos θ + y cos(π/2 - θ).

FIG. 23 shows the geometric relationship of the transform. R stands for the length of a perpendicular drawn from the origin of the coordinate system to a straight line passing through the pixel (x,y). Letter θ denotes the angle between the perpendicular and the positive direction of the x axis. In an actual application, for an arbitrary pixel, the angle θ takes a plurality of discrete values ranging from 0 to π such that R of the expression above is calculated for each value of θ. R is also discretized and its frequency of occurrence is attained in the form of voting for all pixels so that (R, θ) having the highest number of votes obtained is detected as a straight line component.
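The voting procedure described above can be illustrated with a minimal Python sketch. It is not part of the disclosed circuit: the function names `hough_votes` and `strongest_line`, the bin width `r_step`, and the default of 16 angle samples are assumptions made only for illustration.

```python
import math
from collections import Counter

def hough_votes(pixels, n_theta=16, r_step=1.0):
    """Accumulate votes over discretized (R, theta) cells for (x, y) pixels."""
    votes = Counter()
    for x, y in pixels:
        for t in range(n_theta):
            theta = math.pi * t / n_theta            # theta sampled in [0, pi)
            r = x * math.cos(theta) + y * math.sin(theta)
            votes[(round(r / r_step), t)] += 1       # discretize R and vote
    return votes

def strongest_line(pixels, n_theta=16, r_step=1.0):
    """Return (R, theta, vote count) of the cell with the most votes."""
    votes = hough_votes(pixels, n_theta, r_step)
    (r_bin, t), count = votes.most_common(1)[0]
    return r_bin * r_step, math.pi * t / n_theta, count

# Ten pixels on the vertical line x = 5 all vote for theta = 0, R = 5.
r, theta, count = strongest_line([(5, y) for y in range(10)])
```

Every pixel requires one product sum per angle sample, which is why the number of multiplications and the size of the vote memory dominate the cost.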

In the conventional methods, in which a plurality of trigonometric functions are stored as constants in a memory for later use in calculation or the value of each trigonometric function is directly calculated using, e.g., the CORDIC method, a considerable number of multiplications are still necessary even when the number of multiplications is reduced by a clever algorithm. Furthermore, it is not practical to provide a multiplier for each of the multiplications, namely, the multiplier is to be sequentially used. This hinders high-speed operation. Additionally, since an arbitrary input is assumed in a multiplier, even when a value at a digit place of binary input data is zero, a partial product is uselessly calculated for the digit place. When the method is used in which all of the results of products between data items and trigonometric function values are stored in the memory, although the arithmetic unit can be easily designed, the memory capacity is increased and hence the chip size becomes larger.

Moreover, to count the votes for the discrete (R, θ), a large volume of memory is required.

DISCLOSURE OF INVENTION

It is therefore an object of the present invention to provide a discrete cosine high-speed arithmetic unit, a high-speed Hartley transform arithmetic unit, and a high-speed Hough transform circuit in which, considering that each trigonometric function value is constant, the binary value obtained by expanding the trigonometric function value is recoded beforehand into a redundant binary representation of {-1,0,+1} so as to minimize, as far as possible, the number of non-zero coefficients. The resultant values are shifted such that a pair of non-zero coefficients is optimally grouped. For each digit position, associated data pairs are subjected to addition or subtraction according to the signs of the coefficients. Moreover, the resultant values are shifted to be aligned to a fixed position and are then inputted to a group of adders to thereby obtain partial products therebetween, thereby attaining the sum of the partial products. In consequence, the arithmetic units and circuit above are efficiently configured in a compact structure to operate at a high speed.

Since the number of non-zero coefficients is reduced in the constant and the pairs of non-zero coefficient values are grouped for each digit position to commonly effect the addition in an optimal manner, the number of adders is decreased and the number of stages of gates is also minimized.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 is a configuration diagram of a discrete cosine high-speed arithmetic unit of the present invention.

FIG. 2 is a DCT/IDCT calculation expressed in matrix form.

FIG. 3 is a table showing binary expansion values of seven cosine constants, values obtained by canonically recoding the expansion values, and values attained by multiplying the values by the square root of two.

FIG. 4 is a diagram showing a combination of variable pairs to beforehand accomplish addition or subtraction for the anterior addition and shift input positions for the posterior addition.

FIG. 5 is a circuit diagram for the calculation of the one-dimensional DCT/IDCT of the present invention.

FIG. 6 is a one-digit circuit diagram for implementing x_(i) -x_(j) by use of simple gates in place of adders.

FIG. 7 is a one-digit circuit diagram of a redundant binary adder employed according to the present invention.

FIG. 8 is an explanatory diagram of a method of achieving the two-dimensional DCT/IDCT by reducing the two-dimensional form into a one-dimensional form and a method of directly calculating the two-dimensional DCT/IDCT.

FIG. 9 is a table showing binary expansion values of cosα×cosβ at two-dimensional points, i.e., 4×4 points and recoded values thereof.

FIG. 10 is a DCT at two-dimensional 4×4 points in matrix representation.

FIG. 11 is an IDCT at two-dimensional 4×4 points in matrix representation.

FIG. 12 is a diagram showing a method in which the DCT/IDCT at two-dimensional 4×4 points is directly calculated without reducing the two-dimensional form into a one-dimensional form and a combination of pairs of variables to beforehand accomplish addition or subtraction for the anterior addition and shift input positions for the posterior addition.

FIG. 13 is a circuit construction diagram of the present invention in which the DCT at two-dimensional 4×4 points is directly calculated without reducing the two-dimensional form into a one-dimensional form.

FIG. 14 is a configuration diagram of a chip in which a DCT/IDCT high-speed arithmetic unit of the present invention is incorporated.

FIG. 15 is a construction diagram of a high-speed Hartley transform arithmetic unit as an embodiment of the present invention.

FIG. 16 is an explanatory diagram showing a 16-point Hartley transform in matrix representation.

FIG. 17 is an explanatory diagram showing a re-arranged 16-point Hartley transform in matrix representation.

FIG. 18 is an explanatory diagram showing a state in which constants are developed into binary values.

FIG. 19 is a circuit block diagram showing a product sum circuit.

FIG. 20 is a circuit block diagram showing an initial stage of a butterfly arithmetic circuit.

FIG. 21 is a circuit block diagram showing the second and subsequent stages of the butterfly arithmetic circuit.

FIG. 22 is a construction diagram of a high-speed Hough transform circuit as an embodiment of the present invention.

FIG. 23 is an explanatory diagram showing a geometric relationship of a 16-directional Hough transform.

FIG. 24 is an explanatory diagram related to a binary expansion of constant values of cosine functions and grouping of common parts.

BEST MODE FOR CARRYING OUT THE INVENTION

A description will first be given of an 8-point discrete cosine transform (to be abbreviated as DCT herebelow). Assuming that input data and calculation data are respectively x_(k) and X_(n), the formula of DCT is expressed as follows: ##EQU1## Moreover, the formula of inverse DCT (to be abbreviated as IDCT herebelow) is expressed as: ##EQU2## where, ##EQU3## Assuming g(i)=cos(πi/16), equation (1) can be represented as a matrix equation as shown in FIG. 2. Additionally, equation (2) can be represented as a matrix attained by transposing the rows and columns of the matrix in FIG. 2.

Prior to conducting the DCT calculation, consider now the following formula of product sum: ##EQU4## Assuming ##EQU5## in equation (4), the following relationship results: ##EQU6## where a_(k),i ∈ {-1,0,+1}. If there exists a pair of coefficients a_(k),i and a_(j),i having a relationship of a_(k),i =|a_(j),i |=1 for distinct k and j, then the following equation is obtained:

    (x_k a_{k,i} + x_j a_{j,i}) 2^i = (x_k \pm x_j) 2^i    (7)

Furthermore, if there exists a pair of coefficients a_(k),i and a_(j),m having a relationship of a_(k),i =|a_(j),m |=1 for distinct i and m, the following equation results:

    (x_k a_{k,i} + x_j a_{j,m} \cdot 2^{m-i}) 2^i = (x_k \pm x_j \cdot 2^{m-i}) 2^i    (8)

For the product sum x_(k) ·a_(k) +x_(j) ·a_(j), if there exist a plurality of pairs of coefficients satisfying equation (7) or (8), the sums of and differences between x_(k) and x_(j) are calculated according to the principle represented by equation (7), and the sums of and differences between one of x_(k) and x_(j) and the result obtained by multiplying the other one by the n-th power of two (an n=(m-i) digit shift) are calculated according to the principle designated by equation (8). The obtained results are then respectively shifted to the digit positions satisfying the condition above and added to each other, thereby reducing the number of calculations for the partial product sum.
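As an illustration of the principle of equation (7), the following Python sketch evaluates a two-term product sum with the pair sum and pair difference computed once and reused at every digit position where both constants have a non-zero digit. The function names and the example digit lists are hypothetical, digits are listed least significant first, both lists are assumed to have the same length, and the cross-position pairing of equation (8) is not modelled.

```python
def constant_value(digits):
    """Value of a signed-digit constant; digits[i] in {-1, 0, +1} weights 2**i."""
    return sum(d << i for i, d in enumerate(digits))

def paired_product_sum(xk, xj, ak, aj):
    """Return (xk*A_k + xj*A_j, number of partial terms that were added)."""
    pair_sum, pair_diff = xk + xj, xk - xj   # anterior additions, done once
    total, terms = 0, 0
    for i, (a, b) in enumerate(zip(ak, aj)):
        if a and b:                          # both digits non-zero: equation (7)
            term = pair_sum if a == b else pair_diff
            if a < 0:
                term = -term
        elif a:
            term = a * xk                    # a is +1 or -1, so this is a sign only
        elif b:
            term = b * xj
        else:
            continue
        total += term << i                   # shift to digit position i, then add
        terms += 1
    return total, terms

ak = [1, 0, -1, 1]    # 1 - 4 + 8 = 5
aj = [1, 1, 1, 0]     # 1 + 2 + 4 = 7
result, terms = paired_product_sum(3, 10, ak, aj)   # 3*5 + 10*7
```

In this example the two constants contain six non-zero digits in total, but only four partial terms enter the posterior addition because two digit positions are shared.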

In addition, appropriately using a relationship

    2^i - 2^0 = 2^{i-1} + 2^{i-2} + \cdots + 2^0    (9)

called a canonical recode, in which i non-zero coefficients can be decreased to two non-zero coefficients, shift operations are conducted to thereby increase the pairs of coefficients satisfying equation (7) or (8).
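A minimal Python sketch of such a recoding is given below. It uses the standard non-adjacent form construction, which is one common way to realize a canonical recode; whether the figures of this document were produced by exactly this procedure is an assumption, and the function name is hypothetical.

```python
def canonical_recode(n):
    """Signed digits in {-1, 0, +1} of a non-negative integer, least significant first.

    No two adjacent digits are non-zero, so a run of ones such as
    2**(i-1) + ... + 2**0 collapses to 2**i - 2**0.
    """
    digits = []
    while n:
        if n & 1:
            d = 2 - (n & 3)    # +1 when n % 4 == 1, -1 when n % 4 == 3
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits

digits = canonical_recode(0b1111)    # 15 = 16 - 1
```

Here the four non-zero digits of 1111 become the two non-zero digits of 16 - 1, in line with the relationship above for i = 4.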

Next, a description will be given of a method in which the amount of calculations of the DCT partial product sum is minimized according to these principles. First, FIG. 3 shows binary expansion values of seven cosine constants g(i) up to the 16-th digit. Also shown are the canonical recode values (the number of non-zero coefficients is changed from 59 to 42 through the recoding process). Here, -1 is represented by applying an overline to 1. Additionally, since g(4) multiplied by the square root of two results in a simple value of one, there are also shown the results obtained by multiplying the constants by the square root of two (the number of non-zero coefficients is changed from 46 to 43). As above, when an appropriate fixed value is applied to the cosine constant, the number of non-zero coefficients is increased or decreased after the recoding process, and hence existence of an optimal solution can be expected. This naturally depends on the number of figures assumed. In this case, considering also the inverse DCT calculation (inversely, the constant is divided in the IDCT by the fixed value used in the multiplication), since the digit place can be easily aligned through the shift operation, it can be noted that either one of g(i) and the result obtained by multiplying g(i) by the square root of two is significant. However, this is not the case if only one of the DCT calculation and the IDCT calculation is utilized. In this embodiment, description will be primarily given on the assumption that the value is multiplied by the square root of two.
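The effect of recoding and of the scaling by the square root of two on the digit counts can be explored with the following Python sketch. It counts non-zero digits of the seven constants g(1) to g(7) in a 16-bit fixed-point form. The rounding rule, the function names, and the use of the identity that the non-adjacent form of n has as many non-zero digits as n XOR 3n has one bits are choices made here for illustration; the totals depend on rounding and on the number of digits kept, so they need not reproduce the counts quoted above.

```python
import math

def binary_weight(n):
    """Number of non-zero digits in the plain binary expansion of n."""
    return bin(n).count("1")

def recoded_weight(n):
    """Number of non-zero digits after non-adjacent-form recoding of n."""
    return bin(n ^ (3 * n)).count("1")

def digit_totals(scale=1.0, frac_bits=16):
    """Total non-zero digits (binary, recoded) over scale * g(i), i = 1..7."""
    constants = [
        round(scale * math.cos(math.pi * i / 16) * (1 << frac_bits))
        for i in range(1, 8)
    ]
    return (sum(map(binary_weight, constants)),
            sum(map(recoded_weight, constants)))

plain = digit_totals()                 # g(i) as they are
scaled = digit_totals(math.sqrt(2))    # g(i) multiplied by the square root of two
```

Comparing `plain` and `scaled` for several values of `frac_bits` gives a quick way to see how the choice of fixed multiplier changes the number of partial products.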

In the DCT calculation, g(2k+1) and g(2k) are grouped, namely, they appear respectively in an odd-numbered row and an even-numbered row as shown in FIG. 2. Furthermore, since g(k) appears as having the opposite sign in columns i and j, the difference u_(k) =x_(i) -x_(j) is calculated beforehand between x_(i) and x_(j). In this connection, for example, assuming u₁ =x₀ -x₇, u₃ =x₁ -x₆, u₅ =x₂ -x₅, and u₇ =x₃ -x₄ in the first row, the total sum ##EQU7## (rounded up at the 13-th digit below the decimal point) is calculated as follows. First, as shown in FIG. 4, u₁ +u₃, u₁ -u₃, and u₅ +u₇ are calculated in the anterior addition. Thereafter, a shift and input process is carried out for the results of additions and for the digit places of appearances respectively of u₁, u₃, u₅, and u₇, thereby achieving the posterior addition for these items at once. Next, assuming u₂ =(x₀ +x₇)-(x₃ +x₄) and u₆ =(x₁ +x₆)-(x₂ +x₅) in the second row, the total sum u₂ ·g(2)+u₆ ·g(6)=

    1.010011101000 u₂ + 0.100010101001 u₆

is calculated as follows. First, as shown in FIG. 4, u₂ and u₆ are obtained to be added to each other in the anterior addition. Thereafter, a shift and input process is carried out for the results of additions and for the digit places of appearances respectively of u₂ and u₆, thereby achieving the posterior addition for these items at once. In addition, for the 0-th and 4-th rows, assuming u₄ =(x₀ +x₇)+(x₃ +x₄) and u₄ '=-(x₁ +x₆)-(x₂ +x₅), it is only necessary to first calculate u₄ and u₄ ' to obtain the total sum (u₄ +u₄ ') ·g(4)=u₄ +u₄ '. For the remaining rows, i.e., rows 3 and 6 and rows 5 and 7, the additions can be achieved by a configuration substantially similar to that used for the first and second rows. In this situation, providing selectors s1, s3, s5, s7, s2, s3, s4, and s4' as shown in FIG. 5, the hardware of the adder section can be commonly used such that the calculation is sequentially achieved for each of the odd-numbered rows and for each of the even-numbered rows. It is to be appreciated that the hardware of the adder section need not be commonly utilized, but may be disposed in a parallel fashion. That is, to concurrently calculate the total sum for each row of the matrix representation shown in FIG. 2, adder hardware is independently arranged for each row.
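The grouping of input differences and sums used for rows 1, 2 and 4 can be checked numerically with the short Python sketch below. Because the exact DCT formula of this document is not reproduced here, the sketch assumes the unnormalized kernel cos((2k+1)nπ/16) and leaves out the multiplication by the square root of two, so that the row-4 result is (u₄ +u₄ ')·g(4) rather than u₄ +u₄ '. All function names are hypothetical.

```python
import math

def g(i):
    return math.cos(math.pi * i / 16)

def dct_row_direct(x, n):
    """Row n of an unnormalized 8-point DCT (assumed kernel, no scale factors)."""
    return sum(x[k] * math.cos((2 * k + 1) * n * math.pi / 16) for k in range(8))

def dct_rows_grouped(x):
    """Rows 1, 2 and 4 computed from pre-added input pairs."""
    u1, u3, u5, u7 = x[0] - x[7], x[1] - x[6], x[2] - x[5], x[3] - x[4]
    u2 = (x[0] + x[7]) - (x[3] + x[4])
    u6 = (x[1] + x[6]) - (x[2] + x[5])
    u4 = (x[0] + x[7]) + (x[3] + x[4])
    u4p = -(x[1] + x[6]) - (x[2] + x[5])
    return {
        1: u1 * g(1) + u3 * g(3) + u5 * g(5) + u7 * g(7),
        2: u2 * g(2) + u6 * g(6),
        4: (u4 + u4p) * g(4),
    }

x = [12, -3, 7, 0, 5, 9, -8, 4]
grouped = dct_rows_grouped(x)
errors = {n: abs(grouped[n] - dct_row_direct(x, n)) for n in grouped}
```

The entries of `errors` are at the level of floating-point rounding, which confirms that each of these rows needs only the pre-added pairs rather than all eight inputs.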

In FIG. 5, it is only necessary to appropriately select the type of adders 120 to 126 and 220 to 241; for example, a full adder or a carry-propagation-free adder may be chosen for each adder. To particularly increase the processing speed, a carry-propagation-free adder should be selected. In such a case, sections 210 and 211 are required to transform the result of the total sum into a binary representation. Moreover, the carry-propagation-free adders are classified into a carry save type and a redundant binary type. Either of these types may be used. In relation to this embodiment, a description will now be given of a circuit construction method adopting a characteristic unique to the redundant binary adder. Since the carry-propagation-free adder includes the same circuit for each digit place, it is only necessary to consider the circuit for an arbitrary digit position. First, since the calculation circuits 100 to 103 for u_(k) =x_(i) -x_(j) associated with the respective digit places conduct operations of 0-0=0, 0-1=-1, 1-0=+1, and 1-1=0, the system can be configured with simple gate circuits shown in FIG. 6 without using any adder circuit. For the calculation circuits 104 to 107 for u_(k) =x_(i) +x_(j), it is only necessary to consider u_(k) =x_(i) +x_(j) =x_(i) -(-x_(j)). The value of -x_(j) can be obtained through {(inverted x_(j))+1} using the representation of two's complement. The inverted value of x_(j) is represented by an overline applied to x_(j). Since the additions in the second and subsequent stages are in the redundant binary representation of {+1,0,-1}, the circuit shown in FIG. 7 is employed as the redundant binary adder circuit for each digit place. The result of the total sum obtained in the redundant binary representation is converted into a binary value. Since a redundant binary value can be decomposed into positive and negative binary values, the converter circuits 210 and 211 can be easily constituted with subtracters. The subtracter may be provided with a dedicated borrow-look-back circuit which corresponds to a carry-lookahead circuit in an adder.
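The two properties relied on above, namely that a digit-wise difference needs no carry and that a redundant binary value converts to binary with a single subtraction, can be sketched in Python as follows. The sketch models values only, not the gate circuits of FIGS. 6 and 7, and the function names are hypothetical.

```python
def digitwise_difference(a, b, width):
    """Signed digits (least significant first) of a - b for non-negative a, b.

    Each digit is 0-0=0, 0-1=-1, 1-0=+1 or 1-1=0, so no carry propagates.
    """
    return [((a >> i) & 1) - ((b >> i) & 1) for i in range(width)]

def redundant_to_binary(digits):
    """Split into positive and negative binary parts, then subtract once."""
    positive = sum(1 << i for i, d in enumerate(digits) if d == 1)
    negative = sum(1 << i for i, d in enumerate(digits) if d == -1)
    return positive - negative

digits = digitwise_difference(0b1101, 0b0110, 4)   # 13 - 6
value = redundant_to_binary(digits)
```

For 13 - 6 the digits are [1, -1, 0, 1], the positive part is 9, the negative part is 2, and the single subtraction gives 7.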

Next, the IDCT calculation will be described. As shown in FIG. 2, the IDCT is attained by transposing the rows and columns of the DCT in the matrix representation. The difference therebetween resides in that the g(k) appearances are grouped according to odd and even values of k in the DCT, but g(k) occurs for all values of k in the IDCT. However, while g(1), g(3), g(5) and g(7) appear in the 0-th column, the same items with the opposite sign occur in the seventh column. In the 0-th and seventh columns, g(4), g(2), g(4), and g(6) respectively appear with the same sign. Additionally, while g(3), -g(7), -g(1), and -g(5) occur in the first column, the same items with the opposite sign appear in the sixth column. In the first and sixth columns, g(4), g(6), -g(4), and -g(2) respectively appear with the same sign. While g(5), -g(1), g(7), and g(3) occur in the second column, the same items with the opposite sign appear in the fifth column. In the second and fifth columns, g(4), -g(6), -g(4), and g(2) respectively appear with the same sign. In addition, while g(7), -g(5), g(3), and -g(1) occur in the third column, the same items with the opposite sign appear in the fourth column. In the third and fourth columns, g(4), -g(2), g(4), and -g(6) respectively appear with the same sign. In consequence, for the total sum of the grouped values of g(k) related to an odd number of k, by effecting addition of the values having opposite signs for the preceding and succeeding columns, the IDCT results can be attained for two columns at the same time. Adding the circuits 240 and 241 and selectors thereof and the selectors of input data items 110 to 117 to FIG. 5, the hardware of the DCT/IDCT circuit can be commonly used.

Arranging the DCT/IDCT block diagram of FIG. 5 described above for each row of FIG. 2 in a parallel fashion for each calculation, there is provided the configuration diagram of FIG. 1 showing the discrete cosine high-speed arithmetic unit of the present invention. In short, eight original/calculation data items are simultaneously inputted to the arithmetic unit. In an anterior adder section 10, there are calculated beforehand the sums of and/or differences between the input data items (through the recoding process of cosine constant values) and the sums of and/or differences between the values obtained by multiplying the input data items by a fixed value (through the position shift operation of cosine constant values). Next, in the posterior adder section 20, the resultant values from the anterior adder section 10 are shifted for the alignment to a predetermined digit position to be then inputted to the group of adders, thereby calculating partial products. Obtaining the total sum of the partial products, eight output (calculation/original) data items are simultaneously obtained. In the conventional multiplier-based system, 12 multiplications (11×12=132 additions) and 29 additions are required. This requires 161 adders in total. Additionally, when one addition is denoted as one stage, there exist 14 stages of addition. According to the method of the present invention, 116 (=29×4) adders in total and five stages of addition are required. In consequence, while decreasing the number of adders to about 2/3 of that of the conventional system, the processing speed can be advantageously increased to three times that of the conventional system.

The DCT/IDCT described above is related to one-dimensional calculations and is primarily adopted for the compression/decompression of voices. To apply the arithmetic unit to the compression/decompression of a two-dimensional image expressed with coordinates (x,y), the image is conventionally decomposed into two one-dimensional elements associated with x-directional and y-directional scans. Namely, the results of first one-dimensional elements related to the x-directional scan are provisionally stored in a random access memory (RAM). The rows and columns are transposed to be inputted to second one-dimensional elements related to the y-directional scan for the calculation. The obtained results are related to the two-dimensional image. In contrast to the conventional method described above, description will be given of a method of the present invention in which the calculation is directly achieved in the two-dimensional manner without decomposing the image into the one-dimensional elements. For simplicity of explanation, description will be given of the two-dimensional case of 4×4 points. This will be easily expanded to a case of 8×8 points. In the two-dimensional case, constant values of cosα·cosβ are required to be calculated. If the fixed values respectively of cosα and cosβ are separately acquired for the calculation, multiplications will be required. However, in the case of 4×4 points, when the six combinations of the multiplied values are obtained in advance, the multiplications between the constants are unnecessary. Additionally, the storage operation of data in the RAM required in the conventional method in which the image is decomposed into two one-dimensional elements becomes unnecessary and hence the processing speed is increased. Assume f(i)=cos(πi/8). Then, the matrices of two-dimensional DCT and IDCT for 4×4 points are as shown in FIGS. 10 and 11. However, in this connection, the terms of coefficients which can be shifted are substantially unnecessary for the explanation and hence are not shown. Incidentally, matrix F^(t) indicates a transposed matrix of matrix F.

According to the expression of the two-dimensional DCT of FIG. 10, it can be seen from FIG. 12 that the values are attained up to the 16-th digit below the floating decimal point and the pairs of additions are attained as (f(1)f(2),f(2)f(3)) and (f(1)f(1),f(3)f(3)), and the single additions are f(1)f(3) and f(2)f(2). As in the one-dimensional case, pairs of additions are selected for the anterior addition and the results are subjected to a shift and input process according to the associated digit places so as to be added to each other at a time in the posterior addition 20. FIG. 13 briefly shows the block diagram of the two-dimensional DCT for 4×4 points. In this situation, all data items are assumed to be calculated in a parallel manner. Therefore, 4×4=16 data items x_(ij) are stored beforehand in buffer memories or registers. The calculation of x_(ij) -x_(kl) 60 is carried out by gates (for one digit position) of a circuit 600 without using any adder. The calculation of x_(ij) +x_(kl) 61 is performed by gates (for one digit position) of a circuit 610 without using any adder. However, such adders utilized thereafter as adders 50 and 51 are redundant binary adders of which a one-digit circuit is as shown in FIG. 7. The additions in the group of adders are classified into three types 70 to 72 and there exist 16 blocks in total. Since the equation of the two-dimensional IDCT for 4×4 points is as shown in FIG. 11, the hardware of the DCT can be commonly used in a method similar to that applied to the one-dimensional case. According to the method of the present invention, since the two-dimensional item need not be decomposed into two one-dimensional elements, it is unnecessary to store the temporary data items in the transposing RAM, leading to an advantage that the calculation is executed without interruption and the processing speed is increased. However, the number of adders required is increased to about three times that used in the case in which the one-dimensional decomposing operation is effected.
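The relation between the direct two-dimensional calculation and the conventional two-pass calculation can be illustrated with the Python sketch below. It assumes an unnormalized 4×4 kernel cos((2k+1)nπ/8), since the matrices of FIGS. 10 and 11 are not reproduced here, and it works in floating point rather than with shifted and recoded constants. The function names are hypothetical.

```python
import math
from itertools import combinations_with_replacement

N = 4

def f(i):
    return math.cos(math.pi * i / 8)

def product_constants():
    """The six products f(a)*f(b), a <= b, that can be prepared in advance."""
    return {(a, b): f(a) * f(b)
            for a, b in combinations_with_replacement((1, 2, 3), 2)}

def kernel(k, n):
    return math.cos((2 * k + 1) * n * math.pi / 8)

def dct2_direct(x):
    """Each output is one product sum over all 16 inputs; no intermediate store."""
    return [[sum(x[i][j] * kernel(i, m) * kernel(j, n)
                 for i in range(N) for j in range(N))
             for n in range(N)] for m in range(N)]

def dct2_row_column(x):
    """Two one-dimensional passes with an intermediate buffer between them."""
    rows = [[sum(x[i][j] * kernel(j, n) for j in range(N)) for n in range(N)]
            for i in range(N)]                       # held until the second pass
    return [[sum(rows[i][n] * kernel(i, m) for i in range(N))
             for n in range(N)] for m in range(N)]

x = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 17]]
direct, two_pass = dct2_direct(x), dct2_row_column(x)
```

Both functions return the same values up to rounding. The direct form corresponds to the arrangement in which every output is a single product sum with pre-multiplied constants, and `rows` plays the role of the transposing RAM that the direct form avoids.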

FIG. 14 shows an example of a chip system 300 adopting the DCT/IDCT high-speed arithmetic unit of the present invention. An image is stored in a buffer memory section 320 such as a frame memory or a register in which a parallel access operation can be achieved. A DCT/IDCT section 310 simultaneously obtains all data items including 8×8=64 data items necessary for the operation and then outputs results of the operation to a quantizing section 330. The data is then compressed by a variable-length encoding section 340 to be sent to a transmission route or stored on a storage medium (350). Conversely, compressed data supplied from the transmission route or storage medium to the system 300 is decompressed by a variable-length decoder section 340. The data is then restored into the original calculation data by an inverse quantizing section 330 to be inversely transformed by the IDCT 310 into the original image data. The image data is displayed as a picture via the buffer memory 320.

Description will now be given of a 16-point discrete Hartley transform (DHT). Assuming input data and calculation data to be x_(k) and X_(n), the expression of the DHT is represented by equation (10). Moreover, the inverse equation is represented by equation (11). ##EQU8## where ##EQU9##

Since equations (10) and (11) are of the same form when the multiplication of 1/16 (easily implemented by a shift operation) is ignored, only equation (10) will be described in the following paragraphs. Using the equation (13) shown below, equation (10) can be represented in matrix notation as shown in FIG. 16. Moreover, the rows and columns of the matrix notation of FIG. 16 can be rearranged as shown in FIG. 17.

    p=cos(π/8)+sin(π/8)=cos(π/8)+cos(3π/8),

    q=cos(5π/8)+sin(5π/8)=-cos(3π/8)+cos(π/8)      (13)

Description will next be given of a method in which the amount of calculations of product sums is reduced in the DHT according to the principles of equations (4) to (9) described above.

In the calculation of a DHT, the values of p, q, and the square root of two appear as multiplication terms in groups as shown in FIG. 17. Furthermore, since these items are of the opposite signs and occur in two or more columns i1, . . . , ij, the sums of and differences between xi1, . . . , xij, namely wk=xi1± . . . ±xij, are calculated beforehand in a systematic fashion.

Additionally, in relation to a row in which, for example, X1 is to be calculated, assuming w1=x0-x8, w2=x4-x12, w3=x2-x10, w4=x1-x9, w5=x3-x11, w6=x5-x13, w7=x7-x15, and up=w4+w5, uq=w6-w7, the calculation of the total sum ##EQU10## is classified into three calculation groups of w1+w2, w3·√2, and up·p+uq·q. Calculation of w1+w2 is accomplished by an adder 560, w3·√2 is calculated by a multiplier 520, and up·p+uq·q is effected by a product sum unit 550.

First, in the calculation of

    up·p+uq·q=up·1.0100111001111011+uq·0.1000101010001011

(rounded off at the 17-th digit below the binary point), 2up+uq, uq-up, and uq+up are calculated in an anterior addition 555 and then a shift and input process is achieved for the results and digit places of occurrences of up and uq, thereby conducting posterior additions 552 to 554 for the resultant values at once as shown in FIGS. 18 and 19. In the calculation of w3·√2=w3·1.0110101000000101, w3 is shifted to be supplied to the digit position of occurrence of 1 so as to calculate the result by the group of adders (multiplier 520).
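The classification of the X1 row into the three groups w1+w2, w3·√2 and up·p+uq·q can be checked numerically with the Python sketch below. Since the DHT formula of this document is not reproduced here, the sketch assumes the usual unnormalized kernel cos(2πnk/16)+sin(2πnk/16); p and q follow equation (13). The function names are hypothetical, and floating point is used in place of the shifted binary constants.

```python
import math

def cas(t):
    return math.cos(t) + math.sin(t)

def dht_direct(x, n):
    """Row n of an unnormalized 16-point DHT (assumed kernel)."""
    return sum(x[k] * cas(2 * math.pi * n * k / 16) for k in range(16))

def dht_x1_grouped(x):
    """X1 from pre-computed differences, split into the three groups."""
    p = math.cos(math.pi / 8) + math.cos(3 * math.pi / 8)
    q = math.cos(math.pi / 8) - math.cos(3 * math.pi / 8)
    w1, w2, w3 = x[0] - x[8], x[4] - x[12], x[2] - x[10]
    w4, w5 = x[1] - x[9], x[3] - x[11]
    w6, w7 = x[5] - x[13], x[7] - x[15]
    up, uq = w4 + w5, w6 - w7
    return (w1 + w2) + w3 * math.sqrt(2) + (up * p + uq * q)

x = [3, -1, 4, 1, -5, 9, 2, -6, 5, 3, -5, 8, 9, -7, 9, 3]
error = abs(dht_x1_grouped(x) - dht_direct(x, 1))
```

The inputs x6 and x14 do not appear in the grouped form because their kernel value in this row is zero, and `error` stays at the level of floating-point rounding.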

First of all, the calculating circuit of each digit place of wk=xi-xj can be configured with simple gate circuits shown in FIG. 20 without using any adder circuit because the calculations are 0-0=0, 0-1=-1, 1-0=+1, and 1-1=0.

The calculation circuit of each digit place of wk=xi+xj can be configured only in consideration of wk=xi+xj=xi-(-xj). The value of -xj can be attained by {(inverse value of xj)+1} according to the representation of two's complement. The inverse value of xj is represented by drawing an overline over xj. Furthermore, since the additions in the second and subsequent stages are in the redundant binary representation of {+1,0,-1}, a basic circuit 571 shown in FIG. 21 is adopted as the redundant binary adder circuit for each digit place.

Moreover, in the Hartley transform circuit, the operation of x±y, a so-called butterfly operation, is often conducted. Therefore, the gate configuration is partially shared between the arithmetic circuits of x+y and x-y to resultantly obtain a butterfly arithmetic circuit 570. The result of the total sum represented in the redundant binary representation is converted into a binary value. Since a redundant binary value can be decomposed into positive and negative binary values, the converter circuit can be easily constituted with subtracters. The subtracter may be provided with a dedicated borrow-look-back circuit which corresponds to a carry-lookahead circuit of an adder.

As a result of the description above, there is constructed a DHT arithmetic unit shown in FIG. 15. Assuming that 16 data items x0 to x15 are inputted thereto in a parallel manner, a butterfly operation is performed for the data items by eight circuits 560. Since the output therefrom is represented in redundant binary representation, the butterfly operations in the second and subsequent stages are accomplished entirely by the circuits 570. According to the groups shown in FIG. 17, the operation is subdivided into a portion including X0, X8, X4, and X12 in which the calculation is performed only by the group of butterfly arithmetic units, a portion including X2, X10, X6, and X14 in which the calculation is carried out by a circuit 520 additionally disposed to multiply an input value thereto by the square root of two, and a portion Xi (i is an odd number) in which the calculation is executed by a circuit 550 added to the system.

The flow of calculation steps is as shown in FIG. 15, which requires the following arithmetic units: 28 butterfly arithmetic units, four multipliers 520, and four product sum units 550. When the calculation above is conducted by the conventional multiplier-based system, since two adders are necessary in the butterfly arithmetic unit, 15 adders are used in the multiplier, and 31 adders are required in the product sum unit 550, it can be appreciated that 28×2+4×15+4×31=240 adders are required. In addition, when one addition is regarded as one stage, eight stages of addition are employed. In contrast therewith, according to the present invention, only 108 (=20×2+7×4+10×4) adders are necessary in total and only six stages of addition are required. This consequently leads to an advantageous effect that while the number of adders is reduced to be equal to or less than half that of adders employed in the conventional system, the processing speed is increased to about 1.3 times that of the prior art. Since the number of multiplications is abruptly increased in a DHT for 32 points or more, the advantageous effect of the increased processing speed is much more enhanced.

Next, description will be given of a method in which the amount of calculations of partial product sum is decreased according to the principles of equations (4) to (9) already described above. Description will be given of the Hough transform in a case in which the angle θ is divided into 16 directions. Furthermore, only eight directions in the range of angle 0 to π/2 will be described. Since the minimization of the amount of calculations can be achieved in almost the same manner also for the remaining eight directions in the range of angle π/2 to π in which only the sign is partially changed, description thereof and a diagram related thereto will be omitted.

Prior to the realization of calculations of the Hough transform, consider first a formula of product sum represented by equation (14).

    R = x \cdot C_x + y \cdot C_y                   (14)

Assuming now ##EQU11## the following equation results. ##EQU12##

In the equations above, however, there is assumed a condition of c_{x,i}, c_{y,i} ∈ {-1, 0, +1}. If there exists a pair of coefficients c_{x,i} and c_{y,p} for which c_{x,i} = |c_{y,p}| = 1, equation (17) results:

    (x \cdot c_{x,i} + y \cdot c_{y,p} \cdot 2^{p-i}) \cdot 2^{i} = (x \pm y \cdot 2^{p-i}) \cdot 2^{i}                    (17)

For the product sum x·C_x + y·C_y, if there exist a plurality of pairs of coefficients which satisfy equation (17), the sums of and differences between x and y·2^n (where n = p-i is the number of digit places by which y is shifted) are calculated according to the principle designated by equation (17), and the resultant values are then shifted respectively to the digit positions satisfying the condition above so as to be added to each other, thereby decreasing the number of calculations of partial product sums.
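The pairing principle of equation (17) can be sketched in software as follows. This is a minimal illustration, not the circuit of the invention: the function name `shift_add_product_sum`, the dictionary encoding of the signed digits, and the simple rule of pairing digits in ascending order of exponent are all assumptions introduced here. Non-negative integer exponents of 2 are used in place of digit places below the binary point so that every shift is a plain left shift.

```python
def shift_add_product_sum(x, y, cx, cy):
    """Evaluate x*Cx + y*Cy with shifts and adds only.

    cx and cy map a non-negative exponent k to a digit in {-1, +1}, so that
    Cx = sum(c * 2**k for k, c in cx.items()), and likewise for Cy.
    Returns the result and the table of shared pre-sums that were formed.
    """
    xs, ys = sorted(cx.items()), sorted(cy.items())
    pre_sums = {}   # (shift of y relative to x, sign) -> x +/- y*2^n, computed once
    total = 0
    for (i, ci), (p, cp) in zip(xs, ys):
        d, sgn = p - i, ci * cp
        if (d, sgn) not in pre_sums:
            # "First unit": one sum or difference of x and a shifted y
            # (x is shifted instead when d is negative, to stay in integers).
            pre_sums[(d, sgn)] = x + sgn * (y << d) if d >= 0 else (x << -d) + sgn * y
        # "Shift unit" and "second unit": move the pre-sum to its digit place and accumulate.
        total += ci * (pre_sums[(d, sgn)] << min(i, p))
    # Digits left without a partner are added individually.
    n = min(len(xs), len(ys))
    total += sum(c * (x << k) for k, c in xs[n:])
    total += sum(c * (y << k) for k, c in ys[n:])
    return total, pre_sums


# Hypothetical coefficients: Cx = 1 + 8 - 64 = -55, Cy = 1 - 16 - 64 = -79.
r, pre_sums = shift_add_product_sum(5, 7, {0: 1, 3: 1, 6: -1}, {0: 1, 4: -1, 6: -1})
print(r, len(pre_sums))  # -828 2
```

In this example three digit pairs need only two distinct pre-sums, because the pairs at exponents 0 and 6 both reduce to x+y and differ only in the final shift and sign.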

In addition, by appropriately employing the relationship of equation (18), called canonical recoding, in which i non-zero coefficients can be reduced to two non-zero coefficients, shift operations are conducted so as to increase the number of pairs of coefficients satisfying equation (17) as follows:

    2^{i} - 2^{0} = 2^{i-1} + 2^{i-2} + \cdots + 2^{0}            (18)
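The effect of equation (18) can be illustrated with a short recoding routine. The sketch below uses the standard non-adjacent-form algorithm, which is one well-known way of obtaining a canonical signed-digit representation; it is substituted here for illustration and is not claimed to be the exact recoding procedure used to produce FIG. 24. The function name `csd_digits` is hypothetical.

```python
def csd_digits(n):
    """Signed digits in {-1, 0, +1} of a non-negative integer n, least significant first.

    The result is the non-adjacent form: no two neighbouring digits are both non-zero.
    """
    digits = []
    while n != 0:
        if n & 1:
            d = 2 - (n & 3)   # +1 when n mod 4 == 1, -1 when n mod 4 == 3
            n -= d            # the remainder is now divisible by 2
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits


# 15 = 2^3 + 2^2 + 2^1 + 2^0 has four non-zero bits; recoded it is 2^4 - 2^0.
print(csd_digits(15))  # [-1, 0, 0, 0, 1]
```

A run of i consecutive ones thus collapses to one +1 and one -1, which is the reduction described by equation (18).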

FIG. 24 shows the recoded results of eight cos values (rounded off at the 17-th digit below binary point). Moreover, according to the relationship between the equation of the Hough transform and equation (17), the common pairs of x and y for the addition and subtraction can be arranged as enclosed in a rectangle in FIG. 24.

In a case of, for example, θ=3π/16, since R = x cos(3π/16) + y cos(5π/16), the calculation can be conducted in groups of x+y for the first, fourth, and 14-th digit places below binary point, x-2y for the eighth and 11-th digit places below binary point, and x+2y for the 16-th digit place below binary point in the anterior addition. In addition, these anterior addition groups can also be commonly adopted for other values of θ. In the conventional method based on multipliers, the values of cos(3π/16) and cos(5π/16) stored in a table are read therefrom to be respectively multiplied by x and y. Consequently, the common anterior addition steps above have been impossible.
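A multiplier-free evaluation of R for θ=3π/16 can be sketched as follows. The sketch assumes 16 fractional bits, recodes both cosine coefficients with the standard non-adjacent form, and applies them to x and y with shifts and additions only. The helper names, the pixel coordinates, and the recoding routine are assumptions made for illustration; the digit groups that result need not coincide with those of FIG. 24, which is not reproduced here.

```python
import math

FRAC_BITS = 16  # assumed coefficient word length


def signed_digits(n):
    """Non-adjacent-form digits of n >= 0, least significant first."""
    out = []
    while n:
        d = (2 - (n & 3)) if n & 1 else 0
        n = (n - d) >> 1
        out.append(d)
    return out


def shift_add_multiply(v, digits):
    """v times the integer encoded by digits, using only shifts and additions."""
    return sum(d * (v << k) for k, d in enumerate(digits) if d)


x, y = 100, 50  # hypothetical pixel coordinates
cx = signed_digits(round(math.cos(3 * math.pi / 16) * 2**FRAC_BITS))
cy = signed_digits(round(math.cos(5 * math.pi / 16) * 2**FRAC_BITS))

r_fixed = (shift_add_multiply(x, cx) + shift_add_multiply(y, cy)) / 2**FRAC_BITS
r_float = x * math.cos(3 * math.pi / 16) + y * math.sin(3 * math.pi / 16)
print(r_fixed, r_float)
```

Because each coefficient is rounded to 16 fractional bits, the two results differ by at most (|x|+|y|)·2^-17.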

Values of R to be discretized are generated by an R decoder. Moreover, the R decoder is directly connected to a voting counter such that the value of the counter associated with votes is decoded and an operation of +1 is carried out. The R decoder and voting counter are arranged for each θ.
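The role of the R decoder and voting counter can be pictured with the following software sketch. It keeps one row of counters per direction θ, discretizes R by rounding, and increments the selected counter. The function name `hough_votes`, the rounding rule, and the use of floating-point multiplication in place of the shift-and-add hardware are simplifications assumed for illustration.

```python
import math

N_THETA = 16  # number of directions, as in the 16-direction case described above


def hough_votes(pixels, r_max):
    """Return votes[k][r + r_max] for theta = k*pi/16; r_max must bound |R|."""
    votes = [[0] * (2 * r_max + 1) for _ in range(N_THETA)]
    for x, y in pixels:
        for k in range(N_THETA):
            theta = k * math.pi / N_THETA
            # "R decoder": discretize R to select one counter of this direction.
            r = round(x * math.cos(theta) + y * math.sin(theta))
            # "Voting counter": +1 on the selected counter.
            votes[k][r + r_max] += 1
    return votes


# Ten pixels on the vertical line x = 5 all vote for (theta = 0, R = 5).
votes = hough_votes([(5, y) for y in range(10)], r_max=12)
print(votes[0][5 + 12])  # 10
```

In the hardware described here the counters for all directions are updated in parallel, whereas the loop above visits them one after another.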

As a result of the description above, there is configured the Hough transform circuit shown in FIG. 22. Assume that coordinate data (x,y) of an arbitrary pixel is inputted thereto. The data is subjected to additions and subtractions in the circuit 560. Since the outputs therefrom are represented in redundant binary representation, the operations in the second and subsequent stages are accomplished entirely by a circuit 571.

In the multiplier-based system of the prior art, since 15 adders are necessary for one multiplier (in a 16-bit processing system), there are required 8×(2×15+1)=248 adders. Additionally, when one addition is regarded as one stage, five stages of addition are used (under a condition that eight multipliers are adopted in parallel). According to the present invention, there are required in total only 24 adders including the circuit 571 (the adders of the circuit 560 are simple gates and are not included). Moreover, only four stages of addition need to be employed. Therefore, paying attention only to the Hough transform calculating section, there is attained an advantage that while reducing the number of adders to 1/10 or less as compared with the conventional system, the processing speed is increased to about 1.25 times that of the prior art. Furthermore, in accordance with the present invention, since the read operation of the coefficient table is unnecessary and the update of the voting counter is executed without interruption, it can be expected that the processing speed is actually increased to at least ten times that of the conventional system. In addition, since only one multiplier is adopted and/or adders with carry propagation are utilized in many usual cases, the processing speed can be expected to increase to at least 1000 times that of the ordinary conventional cases.

As an application example of the present invention, there can be considered, for example, an application in which such items primarily including straight line components as Chinese characters are to be recognized. Moreover, in a case in which the directions are fixed to about 16 directions, there can possibly be adopted a utilization mode in which the straight line components are first detected through a coarse detection step to be then sieved for a fine detection. The sieved pixels can be further processed for a precise determination of the direction such that the operation efficiently proceeds to the subsequent work processes. For the fine determination of direction, the system conducts calculation of equation (19) according to the addition theorem of trigonometric functions, ##EQU13## where α is an angle with the precision of 16 divisions and β indicates an angle with a finer division. Consequently, the calculation of equation (19) is precisely achieved using multipliers in a method similar to the conventional method. Assume X=(xcosα+ysinα) and Y=(xsinα+ycosα). It can be appreciated that these values can be immediately calculated by the hardware of the present invention.
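The two-step refinement can be checked numerically. Equation (19) is not reproduced in this text, so the sketch below assumes that it expands the usual Hough form R = x cos(α+β) + y sin(α+β) by the addition theorem. Under that assumption the coarse-stage values combine as R = X cos β + Y sin β with Y = -x sin α + y cos α; the minus sign is part of this assumption and may differ from the sign convention of the original equation. The function names are hypothetical.

```python
import math


def fine_r(x, y, alpha, beta):
    """R at angle alpha + beta, built from the coarse-stage values X and Y."""
    X = x * math.cos(alpha) + y * math.sin(alpha)    # coarse value at one of the 16 directions
    Y = -x * math.sin(alpha) + y * math.cos(alpha)   # assumed sign convention, see the note above
    return X * math.cos(beta) + Y * math.sin(beta)   # two multiplications for the fine step


def direct_r(x, y, theta):
    return x * math.cos(theta) + y * math.sin(theta)


alpha, beta = 3 * math.pi / 16, math.pi / 100   # hypothetical coarse and fine angles
print(fine_r(40, 25, alpha, beta), direct_r(40, 25, alpha + beta))
```

Only the final step needs general multipliers; X and Y are sums of products with the fixed 16-direction coefficients.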

The present invention is not limited to the discrete cosine transform and Hartley transform, but can be expanded generally to trigonometric functions. Therefore, the present invention is applicable also to the discrete Fourier transform and its associated operations (such as Wavelet transform). In addition, the present invention can be applied not only to general transforms using trigonometric functions, such as Hough transforms, but also to Radon transforms which are obtained by generalizing Hough transforms. Moreover, the trigonometric functions can be expanded to general periodic functions. Additionally, the application range can be expanded to a case in which either one of the operations is a product sum operation of constants. Dimensions can be increased from two dimensions to three or more dimensions, and the discretization points can be increased to more than eight. The shift operation of the present invention is fixed as a predetermined operation. When the system to conduct the shift operation is configured with shifters, there is obtained a variable construction and hence the application range is expanded. Although a large number of adders are employed, the basic circuit of an arbitrary one digit place has a regular repetitive structure and hence the design scale can be easily increased.

INDUSTRIAL APPLICABILITY

According to the present invention, there is attained an advantage that the number of gate stages is considerably reduced and the calculation speed is increased in the DCT/IDCT, Hartley transform, and Hough transform. Moreover, since the DCT/IDCT hardware is almost all commonly used, when the basic element is repeatedly used to hold (processing speed)×(area)=(constant) for the high-speed calculation, it is possible to minimize the chip area.

We claim:
 1. A discrete cosine high-speed arithmetic unit for carrying out partial sum of products for discrete cosine transform comprising: a plurality of first units for calculating in parallel sums of and/or differences between a plurality of values obtained by multiplying said plurality of input variables by a constant; and a processing unit including a plurality of shift units for shifting outputs from said plurality of first units by respectively predetermined numbers of digit-shifts and a plurality of second units for calculating concurrently sums of outputs from said plurality of shift units.
 2. A discrete cosine high-speed arithmetic unit for carrying out partial sum of products for discrete cosine transform comprising: a plurality of first units for pre-calculating in parallel sums of and/or differences between a plurality of input variables or sums of and/or differences between a plurality of values obtained by multiplying said plurality of input variables by a constant; and a processing unit including a plurality of shift units for shifting outputs from said plurality of first units by respectively predetermined numbers of digit-shifts and a plurality of second units for post-calculating concurrently sums of outputs from said plurality of shift units.
 3. A discrete cosine high-speed arithmetic unit according to claim 1 or 2, wherein said plurality of first units includes selecting means for selecting, for an i-th column and a j-th column of n-1=i+j in an n-point discrete cosine transform formula, results of calculation of sums of and differences between data of said i-th column and said j-th column immediately after said data are inputted thereto in case of a transform of said n-point discrete cosine transform formula and for selecting the inputted data of an i-th row and a j-th row itself in case of an inverse transform of n-1=i+j in said n-point discrete cosine transform formula, whereby hardware is shared between said transform and said inverse transform of said n-point discrete cosine transform formula.
 4. A discrete cosine high-speed arithmetic unit according to claim 3, including a plurality of gate circuits, each of which is a circuit for transforming a sum in an initial data inputting stage into a subtraction to achieve 1-1=0, 1-0=1, 0-1=-1, and 0-0=0, and redundant binary adders for subsequent additions of outputs from said plurality of gate circuits.
 5. A discrete cosine high-speed arithmetic unit according to claim 1 or 2, including a plurality of gate circuits, each of which is a circuit for transforming a sum in an initial data inputting stage into a subtraction to achieve 1-1=0, 1-0=1, 0-1=-1, and 0-0=0, and redundant binary adders for subsequent additions of outputs from said plurality of gate circuits.
 6. A discrete cosine high-speed arithmetic unit according to claim 1 or 2, wherein a plurality of cosine coefficients multiplied by an appropriate fixed value of a matrix of rows and columns for discrete cosine transform are beforehand multiplied by said appropriate fixed value so that the number of non-zero coefficients after a recoding operation of said plurality of cosine coefficients of said matrix of rows and columns for discrete cosine transform is less than that of initial coefficients which would result if said plurality of cosine coefficients of said matrix of rows and columns for discrete cosine transform were not multiplied by said appropriate fixed value.
 7. A discrete cosine high-speed arithmetic unit according to claim 6, including a plurality of gate circuits, each of which is a circuit for transforming a sum in an initial data inputting stage into a subtraction to achieve 1-1=0, 1-0=1, 0-1=-1, and 0-0=0, and redundant binary adders for subsequent additions of outputs from said plurality of gate circuits.
 8. A data processing system or processor, comprising: an input/output port to input and to output multimedia information; and a buffer memory to buffer therein data of the multimedia information for conducting parallel input and output operations of data via the buffer memory, wherein the operation of data of the multimedia information is carried out by a discrete cosine high-speed arithmetic unit according to claim 1 or 2.
 9. A data storage system for the data processing system or processor of claim 8, wherein the operation of data of the multimedia information is carried out in a realtime fashion and the data storage system has a virtual storage capacity which is larger than an actual storage capacity thereof by two to three orders of magnitude.
 10. A data processing system or processor according to claim 8, including a plurality of gate circuits, each of which is a circuit for transforming a sum in an initial data inputting stage into a subtraction to achieve 1-1=0, 1-0=1, 0-1=-1, and 0-0=0, and redundant binary adders for subsequent additions of outputs from said plurality of gate circuits.
 11. A data storage system according to claim 9, including a plurality of gate circuits, each of which is a circuit for transforming a sum in an initial data inputting stage into a subtraction to achieve 1-1=0, 1-0=1, 0-1=-1, and 0-0=0, and redundant binary adders for subsequent additions of outputs from said plurality of gate circuits.
 12. A data processing system or processor according to claim 8, wherein the output multimedia information includes voice, image and/or code information.
 13. A high-speed Hartley transform arithmetic unit comprising: a plurality of first units for calculating in parallel sums of and/or differences between a plurality of values obtained by multiplying said plurality of input variables by a constant, a processing unit including a plurality of shift units for shifting outputs from said plurality of first units by respectively predetermined numbers of digit-shifts and a plurality of second units for calculating concurrently sums of outputs from said plurality of shift units.
 14. A high-speed Hartley transform arithmetic unit according to claim 13, including a plurality of gate circuits, each of which is a circuit for transforming a sum in an initial data inputting stage into a subtraction to achieve 1-1=0, 1-0=1, 0-1=-1, and 0-0=0, and redundant binary adders for subsequent additions of outputs from said plurality of gate circuits.
 15. A high-speed Hough transform circuit comprising: a plurality of first units for calculating in parallel sums of and/or differences between a plurality of input coordinate data of pixels or sums of and/or differences between a plurality of values obtained by multiplying said plurality of input coordinate data of pixels by a constant, a processing unit including a plurality of shift units for shifting outputs from said plurality of first units by respectively predetermined numbers of digit-shifts and a plurality of second units for calculating concurrently sums of outputs from said plurality of shift units, and results from outputs of said plurality of second units of said processing unit are outputted and decoded in parallel manner, thereby configuring a voting counter.
 16. A high-speed Hough transform circuit according to claim 15, including a plurality of gate circuits, each of which is a circuit for transforming a sum in an initial data inputting stage into a subtraction to achieve 1-1=0, 1-0=1, 0-1=-1, and 0-0=0, and redundant binary adders for subsequent additions of outputs from said plurality of gate circuits.